A Statistical Approach for Similarity Measurement Between Sentences for EBMT
Abstract
The success of Example-Based Machine Translation (EBMT) depends heavily on how efficient its retrieval scheme is. The more similar the retrieved sentence is to the input sentence, the easier it is to adapt the retrieved translation to the current requirement. However, no suitable scheme yet exists for measuring similarity between sentences. This paper reports preliminary results of a similarity measurement scheme based on a linear model whose coefficients are determined by multiple regression. The data for the analysis were collected through a survey of a number of respondents. Three major aspects of similarity are considered: pragmatic, syntactic and semantic. Each respondent was asked to evaluate the similarity between different pairs of sentences, each pair carefully designed to reflect one of these types of similarity. A statistical analysis of these evaluations reveals general human perception of sentential similarity, which will help in designing a suitable retrieval scheme.
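The abstract does not specify the model beyond "linear" with coefficients from multiple regression, so the following is only a minimal sketch under stated assumptions. It assumes each sentence pair is summarized by three hypothetical aspect scores (pragmatic, syntactic, semantic) and that the regression target is a respondent's similarity rating. It then fits sim = w0 + w1·pragmatic + w2·syntactic + w3·semantic by ordinary least squares and uses the fitted model to rank retrieval candidates. All names (`fit_linear_similarity`, `rank_candidates`, `HYPOTHETICAL_WEIGHTS`) and all data are illustrative, not the paper's.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical aspect scores per sentence pair: (pragmatic, syntactic, semantic).
FeatureRow = Tuple[float, float, float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the caller's lists are not mutated.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: aspect scores are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_linear_similarity(features: Sequence[FeatureRow], ratings: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Returns [intercept, w_pragmatic, w_syntactic, w_semantic].
    """
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0, *row] for row in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ratings)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_similarity(coeffs: Sequence[float], row: FeatureRow) -> float:
    """Evaluate the fitted linear model for one sentence pair."""
    return coeffs[0] + sum(w * f for w, f in zip(coeffs[1:], row))


def rank_candidates(coeffs: Sequence[float], candidates: Dict[str, FeatureRow]) -> List[str]:
    """Order retrieval candidates from most to least similar to the input."""
    return sorted(candidates, key=lambda name: predict_similarity(coeffs, candidates[name]), reverse=True)


# Synthetic data, NOT the paper's survey: ratings are generated from
# hypothetical weights so the fit can be checked by recovering them.
HYPOTHETICAL_WEIGHTS = [0.1, 0.2, 0.3, 0.4]
features: List[FeatureRow] = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.5, 0.5, 0.5),
    (0.2, 0.8, 0.4),
    (0.9, 0.1, 0.6),
]
ratings = [predict_similarity(HYPOTHETICAL_WEIGHTS, row) for row in features]
coeffs = fit_linear_similarity(features, ratings)
```

With real survey ratings the data would be noisy, and the fitted weights would indicate how strongly each aspect drives perceived similarity. For larger or ill-conditioned data, a library least-squares routine such as NumPy's `numpy.linalg.lstsq` is a more robust choice than forming the normal equations explicitly.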
Similar resources
Searching Similar (Sub)Sentences for Example-Based Machine Translation
Translation is a repetitive activity. The attempt to automate such a difficult task has long been a scientific dream; in recent years research in this field has attracted growing interest, making some forms of Machine Translation (MT) a reality. Among the several approaches to MT, one of the most promising paradigms is MAHT and, in particular, Example-Based Machine Translation ...
Identifying Synonymous Expressions From A Bilingual Corpus For Example-Based Machine Translation
Example-based machine translation (EBMT) is based on a bilingual corpus. In EBMT, sentences similar to an input sentence are retrieved from a bilingual corpus and then output is generated from translations of similar sentences. Therefore, a similarity measure between the input sentence and each sentence in the bilingual corpus is important for EBMT. If some similar sentences are missed from ret...
Word Selection for EBMT based on Monolingual Similarity and Translation Confidence
We propose a method of constructing an example-based machine translation (EBMT) system that exploits a content-aligned bilingual corpus. First, the sentences and phrases in the corpus are aligned across the two languages, and the pairs with high translation confidence are selected and stored in the translation memory. Then, for a given input sentence, the system searches for fitting examples b...
Using Example-Based MT to Support Statistical MT when Translating Homogeneous Data in a Resource-Poor Setting
In this paper, we address the issue of applying example-based machine translation (EBMT) methods to overcome some of the difficulties encountered with statistical machine translation (SMT) techniques. We adopt two different EBMT approaches and present an approach to augment output quality by strategically combining both EBMT approaches with the SMT system to handle issues arising from the use o...
A novel method for detecting structural damage based on data-driven and similarity-based techniques under environmental and operational changes
The application of time series modeling and statistical similarity methods to structural health monitoring (SHM) provides promising and capable approaches to structural damage detection. The main aim of this article is to propose an efficient univariate similarity method named Kullback similarity (KS) for identifying the location of damage and estimating the level of damage severity. An impr...
Publication date: 2006